The Influence of Coronavirus and Influenza Vaccinations

Analysis of Vaccination, Mortality and Prevalence Statistics, 2019-2020

Magdalene Mlynek

Introduction

When Coronavirus (SARS-CoV-2) quickly spread throughout the world and was declared a pandemic in early 2020, scientists and public health officials began searching for the best ways to minimize the risk of transmission, serious illness and death. It quickly became clear that the safest and best option to achieve immunity was through a vaccine. Biotech and pharmaceutical companies raced to develop, produce and distribute a vaccine. While vaccines typically take 5 to 10 years to be developed and produced, the first vaccines (Pfizer-BioNTech and Moderna) were developed, produced and authorized for emergency use in the United States in a span of just 9 months.

Shortly after, many leaders encouraged people to get the vaccine and even created campaigns and incentives for those who agreed to get the jab. Since then, many countries and businesses have required proof of vaccination status.

The speed at which these vaccines were developed and then required raised many concerns over their efficacy and safety, as well as ethical and health privacy concerns. With the COVID vaccines approved within weeks rather than years, many of the standard vaccine approval protocols were completed and these vaccines were approved for emergency use by the US Food and Drug Administration. One main concern is the limited knowledge about the long-term effects of mRNA vaccine technology, which may "generate strong type I interferon responses that could lead to inflammation and autoimmune conditions" (Wibawa 2021). Although adverse reactions, including anaphylaxis and thrombosis with thrombocytopenia syndrome (TTS), have occurred in only about 0.005% of vaccinations, their risk is another significant concern for many people.

Today, about 60% of the world's population has chosen to receive at least one dose of the COVID-19 vaccine. The motivation for this decision can come from a variety of sources, including public campaigns and government incentives promoting the safety of the vaccine. For example, many US states held random drawings to win cash, scholarships, and tickets to nearby attractions for those who got the vaccine. In addition, many celebrities were paid to promote the vaccine through social media. While these incentives may convince some people to get the vaccine, they are a very costly way to encourage vaccination. Perhaps there is another way to convince people of the efficacy and safety of the vaccine?

Two possible "vaccine motivators" are the number of cases and the number of deaths from COVID, which can create a fear of contracting the virus and a sense of urgency to protect oneself from the potentially high number of positive cases in one's community. More specifically, if a community is seeing a high number of cases or deaths, it is likely that other community members will decide to get the vaccine simply because there are positive cases in their community. This project will investigate the possible association between Coronavirus mortality, prevalence and vaccination rates. In addition, this project will determine if this association is similar to that of Influenza, or if the Coronavirus vaccination behaviours and outlook are unique.

In summary, the main research goal of this project is to determine the association between vaccination rates and prevalence and mortality statistics, and more specifically whether these relationships differ between Covid and Influenza. This will help us to determine whether a population is motivated more by disease prevalence or by mortality in getting a vaccine.

Data Acquisition

A number of datasets were used in this analysis. Coronavirus and Influenza vaccination rates were provided by CDC COVIDVaxView and FluVaxView respectively, and were acquired through an API. A description of these datasets is shown below:

Coronavirus prevalence and mortality rates were found on the CDC COVID Data Tracker site (https://covid.cdc.gov/covid-data-tracker/#datatracker-home) and downloaded as .csv files. The Influenza mortality and prevalence datasets were downloaded as .csv files from the CDC Flu Activity and Surveillance site. Here, prevalence is defined as the proportion of the state population who have had the disease within a defined time period. Mortality is defined as the number of people (per 100,000 population) who have died from the disease, also within a defined time period. A description of the datasets used to measure disease prevalence and mortality for both Coronavirus and Influenza is provided below:

Data Cleaning and Shaping

In order to clean and prepare the datasets for analysis, I utilized a variety of functions to reshape the data into the necessary format. The functions I most commonly used were groupby, loc, dropna, concat and merge. It should be noted that in both the Flu and Coronavirus prevalence datasets, New York and New York City are listed separately. I used a weighted average to determine the prevalence of each disease in the state of NY, where the weights were the proportions of the NY population living in each area (43% live in NYC, 57% live elsewhere in New York). I should also note that data for a few states were missing. Another dataset containing the state names and codes was merged into the final dataset in order to create the maps, which required the state code rather than the state name. In addition, the influenza prevalence rates were multiplied by 1000 to express them on the standard per-100,000-population scale.
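The New York adjustment and the state-code merge described above can be sketched as follows. This is a minimal illustration, assuming pandas; the column names (`state`, `prevalence`, `code`) and all numeric values are hypothetical placeholders rather than the report's data. Only the 43%/57% weights come from the text.

```python
import pandas as pd

# Hypothetical prevalence table; names and values are illustrative only.
prev = pd.DataFrame({
    "state": ["New York", "New York City", "Ohio"],
    "prevalence": [10.0, 20.0, 15.0],
})

# Population shares stated in the text: 43% in NYC, 57% elsewhere in New York.
NY_WEIGHTS = {"New York City": 0.43, "New York": 0.57}
ny_mask = prev["state"].isin(list(NY_WEIGHTS))

# Weighted average of the two New York rows.
ny_rows = prev[ny_mask]
ny_value = (ny_rows["prevalence"] * ny_rows["state"].map(NY_WEIGHTS)).sum()

# Replace the two rows with a single statewide row.
combined = pd.concat(
    [prev[~ny_mask],
     pd.DataFrame({"state": ["New York"], "prevalence": [ny_value]})],
    ignore_index=True,
)

# Hypothetical lookup of state names to postal codes, merged in for mapping.
codes = pd.DataFrame({"state": ["New York", "Ohio"], "code": ["NY", "OH"]})
final = combined.merge(codes, on="state", how="left")
```

A left merge keeps every state in the analysis table even if the lookup has no code for it, which makes missing codes easy to spot as NaN values.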

Visualisation

This analysis utilized a wide variety of plots, including interactive maps using Plotly.

I. Density Plots

First, we will observe the density plots of vaccination rates across the many subpopulations.
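As a rough sketch of what a density plot draws, the curve for each subpopulation can be computed with a Gaussian kernel density estimate. The example assumes SciPy's `gaussian_kde`; the group names and rates are hypothetical and the report's own plotting code may differ.

```python
import numpy as np
from scipy import stats

# Hypothetical state-level vaccination rates (%) for two groups.
rates_by_group = {
    "group_a": [52.0, 55.5, 58.1, 60.3, 57.2, 54.8],
    "group_b": [70.4, 73.9, 68.8, 72.5, 75.1, 71.0],
}

# Evaluate each group's estimated density on a common 0-100% grid.
grid = np.linspace(0.0, 100.0, 1001)
dx = grid[1] - grid[0]

densities = {}
for name, values in rates_by_group.items():
    kde = stats.gaussian_kde(values)  # default bandwidth (Scott's rule)
    densities[name] = kde(grid)

# Each curve should integrate to roughly 1 over the grid.
areas = {name: float(d.sum() * dx) for name, d in densities.items()}
```

Plotting `grid` against each entry of `densities` on shared axes gives overlaid curves that can be compared across groups.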

Influenza:

Coronavirus:

II. Hypothesis Testing

We will now use an ANOVA test to determine if there is a statistically significant difference between the influenza vaccination rates of the eight groups. The null hypothesis is that the vaccination rates of all eight subpopulations are equal; the alternative hypothesis is that at least one differs from the others.

From the above output, we can see that the influenza vaccination rate is different for at least one of the eight subpopulations, as the p-value is less than 0.05. Now we will investigate which pairs of subpopulations are different from one another.
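The two-step procedure described above (an overall ANOVA followed by pairwise comparisons) can be sketched as below. The report does not name its pairwise method, so the Welch t-tests with a Bonferroni correction used here are an assumption, and the group names and rates are hypothetical.

```python
from itertools import combinations
from scipy import stats

# Hypothetical state-level vaccination rates (%); not the report's data.
groups = {
    "children": [60.1, 62.3, 58.7, 61.5, 59.9],
    "pregnant": [55.2, 53.8, 56.4, 54.1, 55.0],
    "healthcare": [80.3, 82.1, 79.5, 81.0, 80.8],
}

# Step 1: one-way ANOVA. H0: all group means are equal.
f_stat, anova_p = stats.f_oneway(*groups.values())

# Step 2: pairwise Welch t-tests, with a Bonferroni-adjusted threshold
# to account for the number of comparisons (an assumed choice of method).
pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)

pairwise = {}
for a, b in pairs:
    t_stat, p = stats.ttest_ind(groups[a], groups[b], equal_var=False)
    pairwise[(a, b)] = (float(p), bool(p < alpha))
```

Each entry of `pairwise` holds the p-value for one pair and whether it falls below the adjusted threshold.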

We can see that the vaccination rates of nearly all flu vaccine subpopulations are statistically different from one another. The pairs of subgroups with no statistically significant difference in vaccination rates are the following:

Similarly, we will investigate if there is a statistically significant difference between the coronavirus vaccination rates of the three age groups.

We can see that there is a statistically significant difference between the coronavirus vaccination rates of at least two of the subpopulations. Now we will run a series of hypothesis tests to determine which pairs of groups differ.

From the output above, we can conclude that each of the subpopulations is statistically different from the others.

III. Correlation between Vaccination Rates, Prevalence and Mortality Rates

To further investigate the statistical differences found previously, we can compute the correlations between Coronavirus and Influenza vaccination rates, prevalence and mortality. These values will allow us to determine if dependencies exist between any of these three variables. The correlations are displayed in the plots below. The first plot (heatmap) shows the correlations between all pairs of Flu prevalence and mortality, and Covid prevalence and mortality.
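A correlation matrix like the one behind the heatmap can be computed as in the sketch below. It assumes pandas and the Pearson correlation (the report does not state which coefficient was used); the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical state-level rates per 100,000; values are illustrative only.
df = pd.DataFrame({
    "covid_prevalence": [9500.0, 11200.0, 8700.0, 10400.0, 12100.0],
    "covid_mortality": [150.0, 210.0, 160.0, 170.0, 190.0],
    "flu_prevalence": [2100.0, 1800.0, 2500.0, 2300.0, 1900.0],
    "flu_mortality": [14.0, 12.5, 16.0, 13.0, 15.5],
})

# Pairwise Pearson correlations between all four columns.
corr = df.corr(method="pearson")

# Look up a single pair, e.g. covid prevalence vs covid mortality.
covid_prev_vs_mort = corr.loc["covid_prevalence", "covid_mortality"]
```

The resulting square table is symmetric with ones on the diagonal, which is the structure a heatmap displays.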

Here we can see that there is a possible association between coronavirus prevalence and mortality, where the correlation is equal to 0.37. We can investigate this relationship in further detail by incorporating vaccination rates. Scatterplots showing the relationship between Prevalence, Mortality and Vaccination rates by group are also shown below.

From the plots above, we can see that the relationships between coronavirus vaccination and prevalence (plot 1) and between coronavirus vaccination and mortality (plot 2) are similar across the three age groups, as their lines of best fit are parallel. However, the oldest age group has a larger mean (intercept) than the middle age group, which has a larger mean than the youngest age group. This means that a change in mortality is associated with a similar average change in vaccination rates regardless of age, but the overall average vaccination rates differ between the groups. These plots are consistent with the results of the previous ANOVA test. We can also notice that there does not appear to be much change in vaccination rate as mortality increases. However, as COVID prevalence increases, vaccination rates appear to decrease. This finding may be expected, because people who test positive and gain immunity against coronavirus may not see the need to get the vaccine. There appears to be no difference in the relationship between prevalence and mortality among the three groups.

Now, I will complete the same analysis for Influenza.

The plots above display a relationship similar to the one we saw for coronavirus. While there appears to be some overlap between some of the subpopulations, the lines of best fit appear to be mostly parallel. We can notice that healthcare workers have the highest vaccination rates regardless of influenza prevalence and mortality. Again, there does not appear to be any significant difference in the relationship between prevalence and mortality among the subpopulations.

IV. Geographic Data Visualisation

Below is a series of interactive maps, made using Plotly, showing the prevalence, mortality and vaccination rates of coronavirus and influenza by state.

We can also compute the ratio of Coronavirus to Influenza prevalence rates to identify states that have a large difference in the number of cases of the two diseases, which may indicate that the diseases spread through the population differently. Similarly, we can compute the mortality ratios.
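The ratio calculation can be sketched as follows, assuming both diseases' rates are already on the same per-100,000 scale. The state labels, column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical rates per 100,000 for three placeholder states.
rates = pd.DataFrame({
    "code": ["AA", "BB", "CC"],
    "covid_prevalence": [12000.0, 9000.0, 10000.0],
    "flu_prevalence": [1500.0, 3000.0, 2500.0],
    "covid_mortality": [170.0, 90.0, 120.0],
    "flu_mortality": [17.0, 10.0, 15.0],
})

# Ratio of Coronavirus to Influenza rates, state by state.
rates["prevalence_ratio"] = rates["covid_prevalence"] / rates["flu_prevalence"]
rates["mortality_ratio"] = rates["covid_mortality"] / rates["flu_mortality"]

# States ordered from the largest to the smallest prevalence ratio.
ranked = rates.sort_values("prevalence_ratio", ascending=False)
```

A ratio well above 1 means Coronavirus is far more common (or more deadly) than Influenza in that state over the period considered.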

We can notice that Delaware has a large prevalence ratio, indicating that while Coronavirus quickly infects much of the population, this is not the case for influenza.

From this map, we can see that in New Jersey, the age-adjusted mortality rate of coronavirus is 8.5 times larger than that of Influenza.

Data Modeling

Now, we will investigate how COVID vaccination rates are explained by the disease prevalence and mortality rate within each subpopulation, using multiple linear regression with prevalence and mortality as the two predictors. The same analysis will also be completed on the influenza vaccination rates. Then, we can compare these models and determine how the vaccination rates of different populations are associated with prevalence and mortality of the disease.

I. Coronavirus vaccination rates as explained by prevalence and mortality

Model 1: $\widehat{\text{CovidVax}}_{\text{18-49 years}} = \hat{\beta}_{1}X_{\text{covid prevalence}} + \hat{\beta}_{2}X_{\text{covid mortality}}$

Model 2: $\widehat{\text{CovidVax}}_{\text{50-64 years}} = \hat{\beta}_{1}X_{\text{covid prevalence}} + \hat{\beta}_{2}X_{\text{covid mortality}}$

Model 3: $\widehat{\text{CovidVax}}_{\text{65+ years}} = \hat{\beta}_{1}X_{\text{covid prevalence}} + \hat{\beta}_{2}X_{\text{covid mortality}}$
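A model of this form can be fitted by ordinary least squares, as in the sketch below. It uses NumPy's `lstsq` rather than whatever library the report used (an assumption). Because the models as written have no intercept term, none is included. The data are hypothetical and constructed so that the known coefficients are recovered exactly.

```python
import numpy as np

# Hypothetical state-level predictors (illustrative units and values).
prevalence = np.array([8.0, 10.0, 12.0, 9.0, 11.0])
mortality = np.array([100.0, 150.0, 120.0, 90.0, 130.0])

# Response built from known coefficients so the fit can be checked.
true_beta = np.array([3.0, 0.2])
vax_rate = true_beta[0] * prevalence + true_beta[1] * mortality

# Design matrix with two columns and no column of ones (no intercept).
X = np.column_stack([prevalence, mortality])
beta_hat, *_ = np.linalg.lstsq(X, vax_rate, rcond=None)

# Fitted values and the uncentered R-squared used for no-intercept fits.
fitted = X @ beta_hat
ss_res = np.sum((vax_rate - fitted) ** 2)
r2_uncentered = 1.0 - ss_res / np.sum(vax_rate ** 2)
```

For fits without an intercept, R-squared is commonly computed against zero rather than against the mean of the response, which tends to produce larger values than the centered form. That is worth keeping in mind when comparing R-squared values across models.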

Coefficients:

P-values of Coefficients:

P-value for Overall fit:

R-squared values:

We can see that within all three subpopulations, prevalence is a statistically significant predictor of vaccination rate, while mortality is not. From the coefficients, we can interpret that as COVID prevalence increases by 1,000 cases per 100,000 population, holding mortality constant, we would expect the proportion of people vaccinated to increase on average by 1.8% in the 18-49 year old category, 2.7% in the 50-64 year old category, and 3.3% in the 65+ year old category. All three models have a relatively high R-squared value.

II. Influenza vaccination rates as explained by prevalence and mortality

Model 4: $\widehat{\text{FluVax}}_{\text{Children}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model 5: $\widehat{\text{FluVax}}_{\text{All healthcare}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model 6: $\widehat{\text{FluVax}}_{\text{Healthcare employees}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model 7: $\widehat{\text{FluVax}}_{\text{Independent practitioners}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model 8: $\widehat{\text{FluVax}}_{\text{Pregnant 18-24 yrs}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model 9: $\widehat{\text{FluVax}}_{\text{Pregnant over 18 yrs}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model 10: $\widehat{\text{FluVax}}_{\text{All}} = \hat{\beta}_{1}X_{\text{flu prevalence}} + \hat{\beta}_{2}X_{\text{flu mortality}}$

Model Coefficients:

P-values of Coefficients:

P-values of overall fit:

R-squared values:

Based on the p-values for the overall fit, all models are statistically significant, and all have relatively high R-squared values. In addition, flu mortality has a significant positive association with flu vaccination rates, while flu prevalence does not.

This is a very interesting finding: across all subpopulations, high Coronavirus prevalence rates are associated with high Coronavirus vaccination rates, whereas high Influenza mortality rates are associated with high Influenza vaccination rates. This indicates that there is likely a fundamental difference between an individual's outlook on the Coronavirus and Influenza vaccines and their motivation to get these vaccines.

Conclusion

This analysis provided a detailed comparison of the spread and outlook of Coronavirus and Influenza, through the lens of vaccination, prevalence and mortality statistics. Subpopulation analyses identified groups that had different associations between these three variables. The three most important and interesting findings of this analysis are the following:

To improve this analysis in the future, I would recommend locating a more detailed and exact influenza dataset, if possible. In addition, if we were to find such a dataset with subpopulations similar to those in the CDC Covid dataset, we could more specifically determine the within-sample effect.

References

Centers for Disease Control and Prevention. (2022, January 21). Weekly US map: Influenza summary update. Centers for Disease Control and Prevention. Retrieved January 27, 2022, from https://www.cdc.gov/flu/weekly/usmap.htm

Centers for Disease Control and Prevention. (2022, January 5). CDC Museum Covid-19 Timeline. Centers for Disease Control and Prevention. Retrieved January 27, 2022, from https://www.cdc.gov/museum/timeline/covid19.html

Centers for Disease Control and Prevention. (n.d.). CDC Covid Data tracker. Centers for Disease Control and Prevention. Retrieved January 27, 2022, from https://covid.cdc.gov/covid-data-tracker/#datatracker-home

Centers for Disease Control and Prevention. (n.d.). Selected adverse events reported after COVID-19 vaccination. Centers for Disease Control and Prevention. Retrieved January 27, 2022, from https://www.cdc.gov/coronavirus/2019-ncov/vaccines/safety/adverse-events.html

Gur-Arie, R., Jamrozik, E., & Kingori, P. (2021). No jab, no job? Ethical issues in mandatory COVID-19 vaccination of healthcare personnel. BMJ Global Health, 6(2), e004877. https://doi.org/10.1136/bmjgh-2020-004877

Pardi, N., Hogan, M. J., Porter, F. W., & Weissman, D. (2018). mRNA vaccines - a new era in vaccinology. Nature Reviews Drug Discovery, 17(4), 261–279. https://doi.org/10.1038/nrd.2017.243

Wibawa, T. (2021). COVID-19 vaccine research and development: Ethical issues. Tropical Medicine & International Health, 26(1), 14–19. https://doi.org/10.1111/tmi.13503